PostgreSQL has a number of variable-length data types. All of them except cstring are stored in the varlena format. The query SELECT typname FROM pg_type WHERE typlen = -1 lists every data type that uses the varlena format, including common ones such as text and json.
Varlena structure
The generic varlena structure is defined as follows.
    struct varlena
    {
        char vl_len_[4];                    /* Do not touch this field directly! */
        char vl_dat[FLEXIBLE_ARRAY_MEMBER]; /* Data content is here */
    };
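Note that this is only the generic form. A varlena value actually takes one of several concrete formats, each with its own definition, and before using a value we must interpret it as the right format according to its first byte. The bit patterns below are written with the highest bit first, which is how the first byte looks on a big-endian machine; on little-endian machines PostgreSQL keeps the same flags in the low-order bits instead.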
If the first byte equals 1000 0000, the value is a varattrib_1b_e, which is used for external data (explained in TOAST).
If the highest bit of the first byte is 1 and the byte is not 1000 0000, the value is a varattrib_1b, which is used for small data.
If the highest bit of the first byte is 0, the value is a varattrib_4b, which can store up to 1 GB of data.
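The three rules above can be sketched as a small standalone classifier. This is only an illustration, not PostgreSQL's own code: the enum and function names are hypothetical, and it assumes the first byte is read with the bit patterns exactly as written above (flag in the highest bit).

```c
#include <stdint.h>

/* Hypothetical names for illustration; not part of PostgreSQL. */
typedef enum
{
    FORMAT_1B_E,   /* first byte is exactly 1000 0000: external data */
    FORMAT_1B,     /* highest bit is 1, byte is not 1000 0000: small data */
    FORMAT_4B      /* highest bit is 0: 4-byte header */
} varlena_format;

/* Pick the concrete format from the first byte of a varlena value. */
varlena_format
classify_first_byte(uint8_t first)
{
    if (first == 0x80)
        return FORMAT_1B_E;
    if (first & 0x80)
        return FORMAT_1B;
    return FORMAT_4B;
}
```

The order of the checks matters: the exact match on 1000 0000 has to come before the general test of the highest bit, otherwise an external value would be mistaken for a small one.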
varattrib_1b type
    typedef struct
    {
        uint8 va_header;
        char  va_data[FLEXIBLE_ARRAY_MEMBER]; /* data starts here */
    } varattrib_1b;

The va_header field has only 8 bits. The highest bit is the marker bit and is always 1. The remaining 7 bits hold the length, which counts the header byte itself, so a varattrib_1b value is at most 127 bytes in total and carries at most 126 bytes of data.
    ------------------
     tag   | length |
    ------------------
     1 bit | 7 bits |
    ------------------
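A minimal sketch of packing and unpacking this one-byte header. The function names are hypothetical, and the sketch assumes the 7-bit length counts the header byte itself, which is why the payload is limited to 126 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Build the header byte for a payload of data_len bytes (at most 126). */
uint8_t
short_header_make(size_t data_len)
{
    size_t total = data_len + 1;      /* the length counts the header byte too */

    assert(total <= 0x7F);            /* must fit in 7 bits */
    return (uint8_t) (0x80 | total);  /* marker bit 1, then 7 length bits */
}

/* Total size (header + payload) stored in the low 7 bits. */
size_t
short_header_total_size(uint8_t header)
{
    return header & 0x7F;
}

/* Payload size: total size minus the one header byte. */
size_t
short_header_data_size(uint8_t header)
{
    return short_header_total_size(header) - 1;
}
```

Since the total is always at least 1, this encoding never produces the byte 1000 0000, which stays free to mark external data.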
varattrib_4b type
    typedef union
    {
        /* va_data stores uncompressed data */
        struct
        {
            uint32 va_header;
            char   va_data[FLEXIBLE_ARRAY_MEMBER];
        } va_4byte;
        /* va_data stores data that has been compressed */
        struct
        {
            uint32 va_header;
            uint32 va_rawsize; /* Size of raw data */
            char   va_data[FLEXIBLE_ARRAY_MEMBER];
        } va_compressed;
    } varattrib_4b;

The first member of both structs, va_header, is a uint32 with the same layout in each. The highest bit is a flag bit and is always 0. The second bit indicates whether the data is compressed. The remaining 30 bits hold the length, so a value can be at most 1 GB (2^30 - 1 bytes).
    -----------------------------
     tag   | compress | length  |
    -----------------------------
     1 bit | 1 bit    | 30 bits |
    -----------------------------
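The layout above can be sketched with plain shifts and masks. This treats the header as a single 32-bit value with the tag in its highest bit, as drawn in the diagram. The names are hypothetical, and PostgreSQL's real macros also have to account for byte order, which this sketch ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_LEN_MASK     0x3FFFFFFFu  /* low 30 bits: length */
#define SKETCH_COMPRESS_BIT 0x40000000u  /* second-highest bit: compressed */
#define SKETCH_TAG_BIT      0x80000000u  /* highest bit: 0 for this format */

/* Pack a length and a compressed flag into a 4-byte header value. */
uint32_t
long_header_make(uint32_t len, bool compressed)
{
    assert(len <= SKETCH_LEN_MASK);      /* must fit in 30 bits */
    return (compressed ? SKETCH_COMPRESS_BIT : 0u) | len;
}

/* True when the tag bit is 0, i.e. the value uses the 4-byte format. */
bool
long_header_is_4b(uint32_t header)
{
    return (header & SKETCH_TAG_BIT) == 0;
}

/* True when the compress bit is set. */
bool
long_header_is_compressed(uint32_t header)
{
    return (header & SKETCH_COMPRESS_BIT) != 0;
}

/* Extract the 30-bit length. */
uint32_t
long_header_length(uint32_t header)
{
    return header & SKETCH_LEN_MASK;
}
```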
varattrib_1b_e type
    typedef struct
    {
        uint8 va_header;
        uint8 va_tag;                         /* type */
        char  va_data[FLEXIBLE_ARRAY_MEMBER];
    } varattrib_1b_e;

The second byte, va_tag, identifies the kind of external data; the four possible values are listed below. The layout of va_data is different for each kind.
    typedef enum vartag_external
    {
        VARTAG_INDIRECT = 1,
        VARTAG_EXPANDED_RO = 2,
        VARTAG_EXPANDED_RW = 3,
        VARTAG_ONDISK = 18
    } vartag_external;
External data stored on disk
If va_tag is VARTAG_ONDISK, the external data is stored on disk, and va_data has the following layout.
    typedef struct varatt_external
    {
        int32 va_rawsize;    /* Original data size (includes header) */
        int32 va_extsize;    /* External saved size (doesn't include header) */
        Oid   va_valueid;    /* Unique ID of value within TOAST table */
        Oid   va_toastrelid; /* RelID of TOAST table containing it */
    } varatt_external;
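One use of the two size fields, sketched with a local copy of the struct so that it compiles on its own. Since va_rawsize includes the 4-byte header and va_extsize does not, the externally stored data was compressed when the saved size is smaller than the raw size minus the header. The sketch_ names and SKETCH_VARHDRSZ are hypothetical stand-ins for PostgreSQL's own definitions, and the check applies to the struct layout shown above.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_VARHDRSZ 4            /* stand-in for the 4-byte header size */

typedef uint32_t SketchOid;          /* stand-in for PostgreSQL's Oid */

/* Local copy of the layout shown above, for illustration only. */
typedef struct
{
    int32_t   va_rawsize;     /* original data size (includes header) */
    int32_t   va_extsize;     /* external saved size (doesn't) */
    SketchOid va_valueid;
    SketchOid va_toastrelid;
} sketch_varatt_external;

/* True when fewer bytes were saved than the raw payload contains. */
bool
sketch_external_is_compressed(const sketch_varatt_external *p)
{
    return p->va_extsize < p->va_rawsize - SKETCH_VARHDRSZ;
}
```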
External data stored in memory
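If va_tag is VARTAG_EXPANDED_RO or VARTAG_EXPANDED_RW, the data is held in memory as an expanded object, and va_data stores a pointer to its header.

    typedef struct varatt_expanded
    {
        ExpandedObjectHeader *eohptr; /* Pointer to the expanded object's header */
    } varatt_expanded;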
Pointer Types
One special format remains, VARTAG_INDIRECT. Its va_data is simply a pointer to another in-memory varlena, which may be an external value (varatt_external or varatt_expanded) or an ordinary varattrib_1b or varattrib_4b value.
    typedef struct varatt_indirect
    {
        struct varlena *pointer; /* Pointer to in-memory varlena */
    } varatt_indirect;
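Because each tag implies a fixed payload struct, the size of va_data follows from va_tag alone. Below is a standalone sketch of that lookup, with local stand-ins for the three payload structs. Everything prefixed sketch_ is hypothetical; PostgreSQL has its own macro for this purpose, VARTAG_SIZE.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum
{
    VARTAG_INDIRECT = 1,
    VARTAG_EXPANDED_RO = 2,
    VARTAG_EXPANDED_RW = 3,
    VARTAG_ONDISK = 18
} vartag_external;

/* Local stand-ins with the same member shapes as the structs above. */
typedef struct
{
    int32_t  va_rawsize;
    int32_t  va_extsize;
    uint32_t va_valueid;
    uint32_t va_toastrelid;
} sketch_external;

typedef struct { void *eohptr; }  sketch_expanded;
typedef struct { void *pointer; } sketch_indirect;

/* Size in bytes of va_data for a given tag; 0 for an unknown tag. */
size_t
sketch_vartag_size(int tag)
{
    switch (tag)
    {
        case VARTAG_INDIRECT:
            return sizeof(sketch_indirect);
        case VARTAG_EXPANDED_RO:
        case VARTAG_EXPANDED_RW:
            return sizeof(sketch_expanded);
        case VARTAG_ONDISK:
            return sizeof(sketch_external);
        default:
            return 0;
    }
}
```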
Usage
The following example creates a varlena value:
    result = (struct varlena *) palloc(length + VARHDRSZ); // Allocate heap memory
    SET_VARSIZE(result, length + VARHDRSZ);                // Set the header
    memcpy(VARDATA(result), mydata, length);               // Write data
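The same three steps can be tried outside the server with a standalone toy version. It is only an analogy: malloc stands in for palloc, the toy_ names are hypothetical, and the header is stored as a plain 32-bit value in host byte order with the length in its low 30 bits, rather than through PostgreSQL's SET_VARSIZE and VARDATA macros.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOY_VARHDRSZ 4   /* size of the 4-byte length header */

/* Allocate a value, set its header to the total size, copy the payload. */
unsigned char *
toy_varlena_new(const void *data, uint32_t length)
{
    uint32_t       total;
    unsigned char *result;

    assert(length <= 0x3FFFFFFFu - TOY_VARHDRSZ); /* must fit in 30 bits */
    total = length + TOY_VARHDRSZ;
    result = malloc(total);                       /* allocate memory */
    if (result == NULL)
        return NULL;
    memcpy(result, &total, TOY_VARHDRSZ);         /* set the header */
    memcpy(result + TOY_VARHDRSZ, data, length);  /* write data */
    return result;
}

/* Total size (header + payload) read back from the header. */
uint32_t
toy_varsize(const unsigned char *value)
{
    uint32_t total;

    memcpy(&total, value, TOY_VARHDRSZ);
    return total & 0x3FFFFFFFu;
}

/* Pointer to the payload, just past the header. */
const unsigned char *
toy_vardata(const unsigned char *value)
{
    return value + TOY_VARHDRSZ;
}
```

Note that, as in the PostgreSQL snippet, the size written into the header is the payload length plus the header size, so a 5-byte payload is recorded as 9. The caller frees the result with free.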
Design Ideas
PostgreSQL designed the varlena structure mainly to solve a problem with cstring: the length of a cstring can only be found by scanning from the start to the terminating null byte, which is inefficient. varlena also has to support data of very different sizes without wasting space, so for small data the length and the marker bit are packed into a single byte. The external format stores only a pointer whose size can be determined from its type, so its first byte does not need to hold a length, and 1000 0000 serves purely as a marker. Every other first byte with the highest bit set belongs to varattrib_1b, and because the stored length is always greater than 0, the two never conflict.
For large data, the length needs more bits, so PostgreSQL stores it in four bytes. To avoid waste, it uses the first two bits of those four bytes as marker bits. As you can see, PostgreSQL's design of these data formats is quite subtle, and there is a lot to learn from it.